

Search for: All records

Editors contains: "Toussaint, Marc"


  1. Tan, Jie ; Toussaint, Marc (Ed.)
    With the advent of large language models and large-scale robotic datasets, there has been tremendous progress in high-level decision-making for object manipulation [1, 2, 3, 4]. These generic models can interpret complex tasks from language commands, but they often have difficulty generalizing to out-of-distribution objects because of the limitations of their low-level action primitives. In contrast, existing task-specific models [5, 6] excel in low-level manipulation of unknown objects, but only work for a single type of action. To bridge this gap, we present M2T2, a single model that supplies different types of low-level actions that work robustly on arbitrary objects in cluttered scenes. M2T2 is a transformer model which reasons about contact points and predicts valid gripper poses for different action modes given a raw point cloud of the scene. Trained on a large-scale synthetic dataset with 128K scenes, M2T2 achieves zero-shot sim2real transfer on the real robot, outperforming the baseline system with state-of-the-art task-specific models by about 19% in overall performance and 37.5% in challenging scenes where the object needs to be re-oriented for collision-free placement. M2T2 also achieves state-of-the-art results on a subset of language-conditioned tasks in RLBench [7]. Videos of robot experiments on unseen objects in both the real world and simulation are available on our project website https://m2-t2.github.io.
    Free, publicly-accessible full text available November 6, 2024
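The M2T2 record above describes a transformer that maps a raw scene point cloud to contact points and gripper poses for several action modes. The sketch below is a minimal, hypothetical illustration of that interface in PyTorch; the class name, dimensions, query scheme, and output heads are assumptions for illustration, not the released M2T2 implementation.

# Minimal sketch, assuming a per-point contact head and one learned query per
# action mode. Hypothetical names and sizes; not the paper's released code.
import torch
import torch.nn as nn

class MultiTaskPoseTransformer(nn.Module):
    def __init__(self, num_modes=2, d_model=256):
        super().__init__()
        self.point_embed = nn.Sequential(
            nn.Linear(3, d_model), nn.ReLU(), nn.Linear(d_model, d_model))
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=4)
        # One learned query per action mode (e.g. pick, place).
        self.mode_queries = nn.Parameter(torch.randn(num_modes, d_model))
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=2)
        # Heads: per-point contact score per mode, and a 7-DoF gripper pose
        # (xyz + quaternion) per mode.
        self.contact_head = nn.Linear(d_model, num_modes)
        self.pose_head = nn.Linear(d_model, 7)

    def forward(self, points):                     # points: (B, N, 3)
        feats = self.encoder(self.point_embed(points))        # scene context
        contact_logits = self.contact_head(feats)             # (B, N, num_modes)
        queries = self.mode_queries.unsqueeze(0).expand(points.shape[0], -1, -1)
        mode_feats = self.decoder(queries, feats)              # (B, num_modes, d_model)
        poses = self.pose_head(mode_feats)                     # (B, num_modes, 7)
        return contact_logits, poses

if __name__ == "__main__":
    model = MultiTaskPoseTransformer()
    cloud = torch.rand(1, 1024, 3)
    contact_logits, poses = model(cloud)
    print(contact_logits.shape, poses.shape)   # (1, 1024, 2) and (1, 2, 7)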
  2. Tan, Jie ; Toussaint, Marc ; Darvish, Kourosh (Ed.)
    Most successes in autonomous robotic assembly have been restricted to a single target or category. We propose to investigate general part assembly, the task of creating novel target assemblies with unseen part shapes. As a fundamental step toward a general part assembly system, we tackle the task of determining the precise poses of the parts in the target assembly, which we term "rearrangement planning". We present General Part Assembly Transformer (GPAT), a transformer-based model architecture that accurately predicts part poses by inferring how each part shape corresponds to the target shape. Our experiments on both 3D CAD models and real-world scans demonstrate GPAT's generalization abilities to novel and diverse target and part shapes.
    Free, publicly-accessible full text available November 6, 2024
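The GPAT record above frames "rearrangement planning" as predicting an SE(3) pose for every part so that the placed parts reproduce the target shape. The snippet below sketches only that task interface: the pose application is standard rigid-body geometry, while the per-part poses stand in for the transformer's predictions. All names are hypothetical, not the paper's code.

# Minimal sketch of the rearrangement-planning I/O, assuming poses are given as
# (rotation matrix, translation) pairs. Placeholder predictions; not GPAT itself.
import numpy as np

def apply_pose(points, rotation, translation):
    """Transform an (N, 3) part point cloud by a rotation matrix and translation."""
    return points @ rotation.T + translation

def assemble(parts, predicted_poses):
    """Place every part into the target frame using its predicted pose."""
    placed = [apply_pose(p, R, t) for p, (R, t) in zip(parts, predicted_poses)]
    return np.concatenate(placed, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    parts = [rng.random((100, 3)) for _ in range(3)]
    # Identity poses as placeholders for the model's per-part predictions.
    poses = [(np.eye(3), np.zeros(3)) for _ in parts]
    print(assemble(parts, poses).shape)   # (300, 3)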
  3. Tan, Jie ; Toussaint, Marc ; Darvish, Kourosh (Ed.)
    Contacts play a critical role in most manipulation tasks. Robots today mainly use proximal touch/force sensors to sense contacts, but the information they provide must be calibrated and is inherently local, with practical applications relying either on extensive surface coverage or restrictive assumptions to resolve ambiguities. We propose a vision-based extrinsic contact localization task: with only a single RGB-D camera view of a robot workspace, identify when and where an object held by the robot contacts the rest of the environment. We show that careful task-attuned design is critical for a neural network trained in simulation to discover solutions that transfer well to a real robot. Our final approach im2contact demonstrates the promise of versatile general-purpose contact perception from vision alone, performing well for localizing various contact types (point, line, or planar; sticking, sliding, or rolling; single or multiple), and even under occlusions in its camera view. Video results can be found at: https://sites.google.com/view/im2contact/home 
    Free, publicly-accessible full text available November 6, 2024
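The im2contact record above poses extrinsic contact localization as a prediction problem over a single RGB-D view: when and where does the grasped object touch the environment. The sketch below illustrates one plausible input/output contract, a per-pixel contact heatmap from a 4-channel RGB-D frame; the tiny placeholder network and its names are assumptions, not the paper's model.

# Minimal sketch, assuming per-pixel contact probabilities as the output format.
import torch
import torch.nn as nn

class ContactHeatmapNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),   # 4 channels: RGB + depth
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),                          # per-pixel contact logit
        )

    def forward(self, rgbd):                              # rgbd: (B, 4, H, W)
        return torch.sigmoid(self.net(rgbd))              # contact probability map

if __name__ == "__main__":
    model = ContactHeatmapNet()
    frame = torch.rand(1, 4, 120, 160)
    print(model(frame).shape)   # (1, 1, 120, 160)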
  4. Shell, Dylan A ; Toussaint, Marc (Ed.)
    We present a learning-based approach to prove infeasibility of kinematic motion planning problems. Sampling-based motion planners are effective in high-dimensional spaces but are only probabilistically complete. Consequently, these planners cannot provide a definite answer if no plan exists, which is important for high-level scenarios, such as task-motion planning. We propose a combination of bidirectional sampling-based planning (such as RRT-connect) and machine learning to construct an infeasibility proof alongside the two search trees. An infeasibility proof is a closed manifold in the obstacle region of the configuration space that separates the start and goal into disconnected components of the free configuration space. We train the manifold using common machine learning techniques and then triangulate the manifold into a polytope to prove containment in the obstacle region. Under assumptions about learning hyper-parameters and robustness of configuration space optimization, the output is either an infeasibility proof or a motion plan. We demonstrate proof construction for 3-DOF and 4-DOF manipulators and show improvement over a previous algorithm. 
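The last record interleaves bidirectional sampling-based search with learning a separating manifold, so the procedure returns either a motion plan or an infeasibility proof. The skeleton below sketches that control flow only; every helper is a stub standing in for the paper's actual components (RRT-Connect extension, manifold learning, triangulation, and the obstacle-containment check), and the toy geometry is purely illustrative.

# Hypothetical control-flow sketch: alternate tree extension with proof
# construction, and stop with whichever succeeds first.
import numpy as np

def extend_rrt_connect(tree_a, tree_b, rng):
    """Stub: grow the bidirectional trees; return a plan if they connect."""
    tree_a.append(rng.random(2)); tree_b.append(rng.random(2))
    return None  # no plan found in this toy stand-in

def learn_separating_manifold(tree_a, tree_b):
    """Stub: fit a manifold separating the two trees (a hyperplane here)."""
    a, b = np.mean(tree_a, axis=0), np.mean(tree_b, axis=0)
    normal = b - a
    return normal, float(np.dot(normal, (a + b) / 2.0))   # plane: normal . x = offset

def manifold_in_obstacle_region(manifold):
    """Stub: triangulate the manifold and verify containment in C-space obstacles."""
    return False  # a real check would test every simplex of the triangulation

def plan_or_prove(start, goal, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    tree_a, tree_b = [np.asarray(start)], [np.asarray(goal)]
    for _ in range(iters):
        plan = extend_rrt_connect(tree_a, tree_b, rng)
        if plan is not None:
            return "plan", plan
        manifold = learn_separating_manifold(tree_a, tree_b)
        if manifold_in_obstacle_region(manifold):
            return "infeasibility proof", manifold
    return "undecided", None

if __name__ == "__main__":
    print(plan_or_prove([0.1, 0.1], [0.9, 0.9])[0])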